Questions

  • In a STL decomposition, how do I pick the window size and the season size?
  • Is there a scientific process to determine outliers in Time Series Modeling?

TODOs

  • Understand how the portmanteau test works, what parameters to use for it, and what p-values are acceptable.

In [ ]:
library(forecast)

In [ ]:
loadData <- function(dataFolder) {
    files <- list.files(dataFolder)
    data <- list()
    for(file in files) {    
        df <- read.csv(paste0(dataFolder, "/", file), stringsAsFactors=F)    
        minYear <- min(df$Year)
        complaintType <- substr(file,1,(nchar(file))-4)    
        tsObject <- ts(df$Complaints, start=c(minYear, 1), frequency = 12)
        data[[complaintType]] <- tsObject
    }
    data
}

In [ ]:
data <- loadData("../../data/topNComplaints")

In [ ]:
series <- data[["Complaints related to property tax "]]

In [ ]:
series

In [ ]:
tsdisplay(series)

In [ ]:
# data before 2012 are too few to consider
series <- window(series, start=c(2012, 1), end=c(2016, 6))

In [ ]:
tsdisplay(series)

Cleaning up data

Data can contain outliers, missing values, etc. that can hamper the effectiveness of modeling. Fortunately, the eGovs dataset does not have any missing data points, so we only have to worry about outliers.

Outliers can be removed using the tsclean function that comes with the forecast package. Note that the decision to remove / 'cleanse' (winsorize) outliers should be taken after considerable thought: removing outliers has multiple effects - it changes the values themselves, the confidence intervals, etc.


In [ ]:
# create a side-by-side plot
plot(series, col="red", lty=2)
lines(tsclean(series), lty=1)
legend("topright", col=c("red", "black"), lty=c(2,1), legend=c("Original", "Cleaned"))

In the above plot, since the data exhibits a downtrend in the number of complaints, the 'outliers' that are dampened in 2013-2014 are actually not outliers at all. So in this case, cleansing data is unwarranted.

Experiments with decomposition

Aim: To decompose the series into 3 components: Trend-Cycle, Seasonal and Remainder components.

Methods used:

  • Simple moving average decomposition for finding trends
  • Sequential moving averages (moving averages of moving averages)
  • Classical decomposition - additive and multiplicative (using the methods described above)
  • STL Decomposition

Finally, compute MAPE / RMSE after experimenting to evaluate effectiveness

Parameter to select: order = how many values (left + right) should be picked for computing the average. The higher this is, the smoother the curve. Odd orders are 'symmetric', whereas even orders aren't.

Notation: 4-MA means a MA of order 4
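
For intuition, an odd-order MA is just an equally weighted, centred average of the surrounding values. The sketch below reproduces a 3-MA with base R's filter; the assumption that this matches forecast's ma for odd orders is ours, so the two lines are overlaid as a check.

In [ ]:
# a hand-rolled 3-MA: each point is the average of itself and one neighbour on each side
manual.3ma <- stats::filter(series, rep(1/3, 3), sides=2)
plot(series, col="grey")
lines(manual.3ma, col="red")
lines(ma(series, order=3), col="blue", lty=2)  # should overlap the manual version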


In [ ]:
fit <- ma(series, order=3)
plot(series, main= "Complaints related to property tax",
     xlab="Time", ylab="Number of complaints", col="grey")
lines(fit, col="red")

It is clear above that a higher order needs to be used for picking out the trend, since the MA curve still retains some of the peaks/drops in the data.


In [ ]:
old.par <- par(mfrow=c(4, 2), mar = c(2,2,2,2))
for( i in seq(3, 17, 2)) {
    fit <- ma(series, order=i)
    plot(series, main=paste0(i, "-MA of Complaints related to property tax"),
         xlab="Time", ylab="Number of complaints", col="grey")
    lines(fit, col="red")
}

par(old.par)

Increasing the order smooths out the curve, but decreases its length. MA smoothing therefore isn't really ideal for this purpose, but it gives a good intuition about how trends can be discovered in the data. One other drawback is that only odd orders are symmetric - even orders take an extra value from the left or the right, which usually doesn't make sense. In the next section, we will take a look at moving averages of moving averages. As mentioned in [1], these can be used to make even-order MAs symmetric.

Moving averages of moving averages


In [ ]:
# example of a 2 MA of a 4 MA, contrasted with a 4-MA
plot(series, main= "Complaints related to property tax",
     xlab="Time", ylab="Number of complaints", col="grey")
lines(ma(ma(series, 4), 2), col="red")
lines(ma(series, 4), col="blue")
legend('topleft', legend=c("2x4-MA", "4-MA"), col=c("red", "blue"), lty=c(1,1))

In [ ]:
# Let's try different 2xm-MAs to see which is appropriate
old.par <- par(mfrow=c(4, 2), mar = c(2,2,2,2))
for( i in seq(2, 16, 2)) {
    fit <- ma(ma(series, i), 2)    
    plot(series, main=paste0("2x", i, "-MA of Complaints related to property tax"),
         xlab="Time", ylab="Number of complaints", col="grey")
    lines(fit, col="red")
}
par(old.par)

It looks like the 2x12-MA is the smoothest for this series! As is evident in the graph above, taking multiple MAs of the series gives much smoother curves. But it still suffers from the problems mentioned for moving averages.

Classical decomposition

There are two forms of classical decomposition: an additive decomposition and a multiplicative decomposition. The trend-cycle component is extracted using the methods described above. This is subtracted from the series (or divided, for multiplicative decomposition) to obtain the de-trended data. To obtain the seasonal component, the de-trended values are simply averaged over the time period: for instance, if the period is 12 a monthly average is taken, and if it is 4 a quarterly average is taken. The remainder component is calculated by subtracting (or dividing, for multiplicative decomposition) the estimated seasonal and trend-cycle components from the series. A manual sketch of the additive version appears after the list below.

  • As is evident, this method has the disadvantages that come with computing MAs mentioned earlier.
  • In addition to this, Classical decomposition methods assume that the seasonal component repeats from year to year. This may not be true. There might be a changing seasonal trend over the years, for instance.
  • Occasionally, the value of the time series in a small number of periods may be particularly unusual. The classical method is not robust to this.
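
Here is that manual sketch of the additive version; the decompose() call in the next cell does the same thing internally (the manual.* variable names are just for illustration).

In [ ]:
# 1. trend-cycle via a centred 2x12-MA
manual.trend <- ma(series, order=12, centre=TRUE)
# 2. de-trend
manual.detrended <- series - manual.trend
# 3. seasonal component: average the de-trended values for each month, centred to sum to zero
monthly.means <- tapply(manual.detrended, cycle(manual.detrended), mean, na.rm=TRUE)
manual.seasonal <- ts((monthly.means - mean(monthly.means))[cycle(series)],
                      start=start(series), frequency=12)
# 4. remainder
manual.remainder <- series - manual.trend - manual.seasonal
plot(cbind(observed=series, trend=manual.trend, seasonal=manual.seasonal, remainder=manual.remainder),
     main="Manual additive decomposition")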

In [ ]:
# additive decomposition - note that the trend (and remainder) values at the ends of the series are missing
plot(decompose(series, type="additive"))

In [ ]:
# multiplicative decomposition
plot(decompose(series, type="multiplicative"))

STL decomposition

Stands for "Seasonal and Trend decomposition using Loess". It is a additive decomposition method. Since it is a fairly complex method of decomposition, details are left out. Details can be found in [2]. It comes with a couple of advantages over the earlier methods. It does handle values near the beginning and end, unlike MA and classical methods, and also accounts for change in seasonal patterns.

A couple of disadvantages: it does not automatically handle trading-day or calendar variation, and it only provides facilities for additive decompositions.


In [ ]:
# need to specify at least s.window, and 
# optionally two other parameters, t.window and robust (these are the most important, there are others)
# s.window -> seasonal window, t.window -> trend window, 
# these control how rapidly the trend and seasonal components can change. Small values allow more rapid change
# you can set s.window to periodic to specify that the seasonal patterns do not change
plot(stl(series, s.window="periodic"))

In [ ]:
# if you set s.window to smaller values, you can see that the seasonal pattern changes 
old.par <- par(mfrow=c(2, 2), mar=c(3,3,3,3))
plot(stl(series, s.window=3)$time.series[, 1], main="Seasonal Component with s.window = 3")
plot(stl(series, s.window=6)$time.series[, 1], main="Seasonal Component with s.window = 6")
plot(stl(series, s.window=10)$time.series[, 1], main="Seasonal Component with s.window = 10")
plot(stl(series, s.window=12)$time.series[, 1], main="Seasonal Component with s.window = 12")
par(old.par)

In [ ]:
# changing t.window will change the 'smoothness' of the trend curve
# it is similar to order in MA trends
s.window <- "periodic"
old.par <- par(mfrow=c(2, 3))
for(tw in c(1, 3, 5, 6, 12, 14)) {
    plot(stl(series, s.window=s.window, t.window=tw)$time.series[, 2], 
         main=paste0("Trend Component with t.window = ", tw))
}
par(old.par)

In [ ]:
# TODO robust - how does it affect the modeling? 
# TODO documentation only mentions that it affects loess 
plot(stl(series, s.window="periodic", robust=T))

Forecasting using STL


In [ ]:
# stlm = stl + time-series model
# by default uses ets, can specify other models
startDate <- c(2015, 6)

train <- window(series, end=startDate)
test <- window(series, start=startDate)
fit <- stlm(train, s.window=12, method="ets")
prediction <- forecast(fit, 12)
plot(prediction, col="grey")
lines(test, col="red")

In [ ]:
# compute MAPEs
accuracy(prediction, test)

ARIMA Models

Preprocessing for ARIMA models

To fit an ARIMA model, we need to first ensure that the data have some properties. These properties are:

  • The time series must be stationary

Stationary Time Series

TODO add more details here

A stationary time series is one whose properties do not depend on the time at which the series is observed: it will have no predictable patterns in the long-term, in general. Time plots will show the series to be roughly horizontal (although some cyclic behaviour is possible) with constant variance.

[3] details several methods to check for stationarity. They are reproduced towards the end.

There are certain pre-processing steps one can use to induce stationarity in a non-stationary series. These include using a Box-Cox transform (typically with $\lambda=0$, i.e. a log transform), differencing, or a combination of these steps.

Transformations such as logarithms can help to stabilize the variance of a time series. Differencing can help stabilize the mean of a time series by removing changes in the level of a time series, and so eliminating trend and seasonality.
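
Both steps are explored individually in the sections below; as a quick preview, combining them in one line might look like this (a sketch):

In [ ]:
# log transform (Box-Cox with lambda = 0) followed by first-order differencing
plot(diff(BoxCox(series, lambda=0)), main="Log-transformed and differenced series")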

Box-Cox transforms

This transform is used if the variance of the data isn't constant, i.e. the variance changes (significantly) with time. Box-Cox transforms are detailed in [4]. The parameter to select is $\lambda$. If $\lambda=0$, the transformation is equivalent to a log transform, which is probably the most common form. Note that most modeling functions in the forecast package accept a lambda argument and apply the transform internally, so an explicit transformation is usually not needed; one only needs to supply the lambda parameter when modeling. Having said that, it is always a good idea to visualize the transformed data to ensure that the transformation has the intended effect.


In [ ]:
transformed <- BoxCox(series, lambda=0)
plot(series)
plot(transformed)

In [ ]:
# try multiple values of lambda
old.par <- par(mfrow=c(3, 2), mar = c(2,2,2,2))
for( i in c(0, 0.1, 0.25, 0.5, 0.75, 1)) {
    transformed <- BoxCox(series, lambda=i)
    plot(transformed, main=paste0('lambda=', i),
         xlab="Time", ylab="Number of complaints")    
}
par(old.par)

Differencing data

Differencing: compute the differences between consecutive observations.

As well as looking at the time plot of the data, the ACF plot is also useful for identifying non-stationary time series. For a stationary time series, the ACF will drop to zero relatively quickly, while the ACF of non-stationary data decreases slowly. Also, for non-stationary data, the value of r1 (first lag) is often large and positive.

For the diff function, lag indicates which lag to use; by default it is 1. differences indicates the order of the difference - the number of times the differencing operation is applied. By default this is also 1, but it is sometimes set to 2 or 3 when the data is non-stationary even after first-order differencing.

The lag parameter can be set to values other than 1 if seasonal differencing is needed; it should be set to the seasonal period. For instance, if the seasonal pattern spans a year, then for monthly data we can obtain seasonally differenced data by setting lag to 12. Note that non-seasonal ARIMA models require data without seasonality, so this step is needed whenever a seasonal pattern is present.


In [ ]:
# simple, first-order differencing
plot(diff(series, lag=1, differences=1))
lines(series, col="grey")
legend("topright", legend=c("Original", "Differenced"), col=c("grey", "black"), lty=c(1,1))

In [ ]:
# you can use the ndiffs method to get the number of differences required to achieve stationarity
ndiffs(series) # use this value for the differences parameter

In [ ]:
# a first-order difference at lag 7 (a true seasonal difference for this monthly data would use lag = 12)
plot(diff(series, differences=1, lag=7))

ACF

Before we move on to ARIMA proper, we need to understand what the ACF (autocorrelation function) is and how to interpret it. The ACF is a useful tool for identifying tentative ARIMA parameters. While ACF plots help in understanding the time series in general, they are particularly useful for ARIMA, as we will see. Autocorrelation is the correlation of a series with itself at a lag of 1 or more. To compute it, use the forecast package's Acf function (rather than the base acf function).

Whether a correlation is significant or not can be ascertained by looking at the blue dashed lines: a correlation whose bar stays within the lines (positive but below the upper line, or negative but above the lower line) is not significant.
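
For reference, the position of the blue lines can be computed by hand; this sketch assumes the default white-noise bands at roughly the 5% significance level.

In [ ]:
# approximate 95% significance bounds for the ACF of a white-noise series of this length
n <- length(series)
c(lower = -1.96/sqrt(n), upper = 1.96/sqrt(n))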


In [ ]:
Acf(series, lag.max=20)

In [ ]:
# some additional Acf plots
transformed.series <- diff(series, differences=1, lag=13)
Acf(transformed.series)

In [ ]:
plot(transformed.series)

PACF

(from [7]):

In general, the "partial" correlation between two variables is the amount of correlation between them which is not explained by their mutual correlations with a specified set of other variables. For example, if we are regressing a variable Y on other variables X1, X2, and X3, the partial correlation between Y and X3 is the amount of correlation between Y and X3 that is not explained by their common correlations with X1 and X2. This partial correlation can be computed as the square root of the reduction in variance that is achieved by adding X3 to the regression of Y on X1 and X2.

The partial autocorrelation function (PACF) is very similar to the ACF; the difference is described in [5].
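
As a rough illustration of the regression interpretation above, the partial autocorrelation at lag $k$ should be close to the last coefficient of an $AR(k)$ fit. The match is only approximate here (an assumption worth noting), since Arima estimates coefficients by maximum likelihood rather than least squares.

In [ ]:
# compare the lag-k partial autocorrelation with the ar-k coefficient of an AR(k) model
sapply(1:4, function(k) {
    ar.fit <- Arima(series, order=c(k, 0, 0))
    c(ar.coef = unname(coef(ar.fit)[paste0("ar", k)]),
      pacf = Pacf(series, lag.max=k, plot=FALSE)$acf[k])
})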


In [ ]:
Pacf(series, lag.max=20)

AR Models

An autoregressive (AR) model assumes that each value of the series is a function of the $p$ prior values - an $AR(p)$ model. To fit an AR model, use the Arima function, with the first value in order being $p$ and the rest 0.


In [ ]:
fit <- Arima(series, order=c(1, 0, 0))
plot(forecast(fit))

MA models

Rather than use past values of the forecast variable in a regression, a moving average model uses past forecast errors in a regression-like model. The number of forecast errors is denoted by $q$, and the model is called an $MA(q)$ model. To fit an MA model, use the Arima function, with the third value in order being $q$ and the rest 0.


In [ ]:
fit <- Arima(series, order=c(0, 0, 12))
plot(forecast(fit))

ARIMA

An ARIMA model - autoregressive integrated moving average - is a combination of an $AR(p)$ and an $MA(q)$ model, in which the data is differenced $d$ times. These three parameters, $(p, d, q)$ (in that order), denote an ARIMA model.


In [ ]:
## simple model with AR(1), MA(1) and 0 differences
fit <- Arima(series, order=c(1, 0, 1))
plot(forecast(fit))

Estimating $(p, d, q)$

These methods are described in detail in [6] and [7]. The following section is a summary of that, along with supporting code.

Estimating $d$

Estimating d, the order of differencing, is described in detail in [6]. An excerpt:

The first (and most important) step in fitting an ARIMA model is the determination of the order of differencing needed to stationarize the series. Normally, the correct amount of differencing is the lowest order of differencing that yields a time series which fluctuates around a well-defined mean value and whose autocorrelation function (ACF) plot decays fairly rapidly to zero, either from above or below. If the series still exhibits a long-term trend, or otherwise lacks a tendency to return to its mean value, or if its autocorrelations are positive out to a high number of lags (e.g., 10 or more), then it needs a higher order of differencing.

Rule 1: If the series has positive autocorrelations out to a high number of lags, then it probably needs a higher order of differencing.

Rule 2: If the lag-1 autocorrelation is zero or negative, or the autocorrelations are all small and patternless, then the series does not need a higher order of differencing. If the lag-1 autocorrelation is -0.5 or more negative, the series may be overdifferenced. Beware of overdifferencing!!

Rule 3: The optimal order of differencing is often the order of differencing at which the standard deviation is lowest.

Rule 4: A model with no orders of differencing assumes that the original series is stationary (mean-reverting). A model with one order of differencing assumes that the original series has a constant average trend (e.g. a random walk or SES-type model, with or without growth). A model with two orders of total differencing assumes that the original series has a time-varying trend (e.g. a random trend or LES-type model).

Rule 5: A model with no orders of differencing normally includes a constant term (which allows for a non-zero mean value). A model with two orders of total differencing normally does not include a constant term. In a model with one order of total differencing, a constant term should be included if the series has a non-zero average trend.
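
As a quick sketch of Rule 3 applied to this series before walking through the other rules (d = 0 means the undifferenced series):

In [ ]:
# standard deviation of the series at each order of differencing (Rule 3)
sds <- sapply(0:2, function(d) {
    if (d == 0) sd(series) else sd(diff(series, differences=d))
})
names(sds) <- paste0("d=", 0:2)
sds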


In [ ]:
# revisit the Acf
Acf(series)

The Acf does not decay rapidly to zero - the series may benefit from differencing. But before that, let's visualize the mean and compute the standard deviation. A moving average can also be used to visualize the variation (the higher the order, the closer it gets to the overall mean).


In [ ]:
plot(series, col="grey")
# visualize the mean
series.mean <- mean(series)
# a 2X4 MA
lines(ma(ma(series, order=2), order=4))
abline(series.mean, 0, col="blue", lty=2)
mtext(paste0("SD: ", sd(series)))

In [ ]:
# difference the series, and visualize it 
d.series <- diff(series, lag=1, differences = 1)
plot(d.series)
abline(series.mean, 0, col="blue", lty=2)
abline(mean(d.series), 0, col="red", lty=2)
mtext(paste0("New SD: ", sd(d.series), " Original SD:", sd(series)))

In [ ]:
Acf(d.series, lag.max=27)

Notice that the Acf is no longer significant at lag 1, and is patternless. However, the standard deviation has increased, and the series has a significant correlation at lag 13 - which is less than ideal.


In [ ]:
# check second order
d2.series <- diff(series, lag=1, differences = 2)
plot(d2.series)
mtext(paste0("New SD: ", sd(d2.series), " Original SD:", sd(series)))

In [ ]:
Acf(d2.series)

This is clearly overdifferenced (Rule 2) - there is a large negative correlation at lag 1, and the standard deviation has also increased. So clearly $d \le 1$.

To confirm, it might be helpful to use a moving average to see how the values fluctuate around the mean


In [ ]:
plot(d.series, col="grey")
# a 2x4 MA
lines(ma(ma(d.series, order=2), order=4))
abline(mean(d.series), 0, col="blue", lty=2)

Estimating $p$ and $q$

Excerpt from [7]:

After a time series has been stationarized by differencing, the next step in fitting an ARIMA model is to determine whether AR or MA terms are needed to correct any autocorrelation that remains in the differenced series. By looking at the autocorrelation function (ACF) and partial autocorrelation (PACF) plots of the differenced series, you can tentatively identify the numbers of AR and/or MA terms that are needed.

The partial autocorrelations at all lags can be computed by fitting a succession of autoregressive models with increasing numbers of lags. In particular, the partial autocorrelation at lag k is equal to the estimated AR(k) coefficient in an autoregressive model with k terms--i.e., a multiple regression model in which Y is regressed on LAG(Y,1), LAG(Y,2), etc., up to LAG(Y,k). Thus, by mere inspection of the PACF you can determine how many AR terms you need to use to explain the autocorrelation pattern in a time series: if the partial autocorrelation is significant at lag k and not significant at any higher order lags--i.e., if the PACF "cuts off" at lag k--then this suggests that you should try fitting an autoregressive model of order k

Rule 6: If the PACF of the differenced series displays a sharp cutoff and/or the lag-1 autocorrelation is positive--i.e., if the series appears slightly "underdifferenced"--then consider adding an AR term to the model. The lag at which the PACF cuts off is the indicated number of AR terms.

An MA signature is commonly associated with negative autocorrelation at lag 1--i.e., it tends to arise in series which are slightly overdifferenced.

Rule 7: If the ACF of the differenced series displays a sharp cutoff and/or the lag-1 autocorrelation is negative--i.e., if the series appears slightly "overdifferenced"--then consider adding an MA term to the model. The lag at which the ACF cuts off is the indicated number of MA terms.

In addition, consider these rules:

Rule 8: It is possible for an AR term and an MA term to cancel each other's effects, so if a mixed AR-MA model seems to fit the data, also try a model with one fewer AR term and one fewer MA term--particularly if the parameter estimates in the original model require more than 10 iterations to converge.

Rule 9: If there is a unit root in the AR part of the model--i.e., if the sum of the AR coefficients is almost exactly 1--you should reduce the number of AR terms by one and increase the order of differencing by one.

Rule 10: If there is a unit root in the MA part of the model--i.e., if the sum of the MA coefficients is almost exactly 1--you should reduce the number of MA terms by one and reduce the order of differencing by one.

Rule 11: If the long-term forecasts appear erratic or unstable, there may be a unit root in the AR or MA coefficients.


In [ ]:
# determining p - look at Pacf
# with d=1
Pacf(d.series)

According to Rule 7, this plot is characteristic of an $MA(4)$ process - so let's try fitting that first, and then vary $p$ and $q$.


In [ ]:
modelArima <- function(series, order, h, testData = NULL) {
    fit <- Arima(series, order=order)
    print(summary(fit))
    predictions <- forecast(fit, h)
    # compute y-axis limits (a NULL testData contributes nothing here)
    min.yvalue <- min(series, testData)
    max.yvalue <- max(series, testData)
    
    plot(predictions, ylim=c(min.yvalue, max.yvalue))
    if(!is.null(testData)) {
        lines(testData, col="red", lty=2)
        print(accuracy(predictions, testData))
    }
    # check if the residuals look like white noise
    Acf(residuals(fit), main="Residuals")
    # portmanteau (Ljung-Box) test; fitdf = number of AR + MA parameters in the fitted model
    print(Box.test(residuals(fit), lag=24, fitdf=sum(order[c(1, 3)]), type="Ljung"))
}

In [ ]:
modelArima(series, c(0, 1, 4), 4)

In [ ]:
modelArima(series, c(0, 1, 5), 5)

In [ ]:
modelArima(series, c(1, 1, 4), 4)

In [ ]:
modelArima(series, c(1, 1, 5), 5)

Note that all of the models above have residuals with significant ACF at certain lags, which is not ideal

Putting it all together - ARIMA

Steps:

  • Partition Data into a train and hold-out set
  • Use STL to decompose the series, and adjust it
  • If necessary, use a BoxCox transform to fix non-constant variance
  • Estimate (p, d, q) and fit an Arima model
  • Verify that the residuals are white noise using a Ljung-Box test and by taking a look at the Acf

In [ ]:
stl.fit <- stl(series, s.window=8)
series.adj <- seasadj(stl.fit)
plot(series.adj)

In [ ]:
series.train <- window(series.adj, end=c(2015, 6))
series.test <- window(series.adj, start=c(2015, 7))

In [ ]:
Acf(series.train)
# looks like d=0 or d=1

In [ ]:
# look at Acf if d=1
Acf(diff(series.train))
# looks like MA(14) or MA(12)

In [ ]:
Pacf(series.train)
# looks like AR(12)

In [ ]:
modelArima(series.train, c(12, 0, 12), length(series.test), series.test)

In [ ]:
modelArima(series.train, c(1, 1, 14), length(series.test), series.test)

In [ ]:
modelArima(series.train, c(11, 1, 14), length(series.test), series.test)

In [ ]:
## finally - try auto.arima
fit <- auto.arima(series.train)
predictions <- forecast(fit, length(series.test))
plot(predictions)
lines(series.test, col="red", lty=2)
print(accuracy(predictions, series.test))

Acf(residuals(predictions))
print(Box.test(residuals(fit), lag=24, fitdf=4, type="Ljung"))

Exponential Smoothing


In [ ]:
series <- window(series,start = c(2012,1), end = c(2016,6))

train_data <- window(series, end=c(2015, 6))
test_data <- window(series, start=c(2015, 7), end = c(2016,6))

In [ ]:
#Having a look at the seasonal component of the entire dataset and the training set
stl.fit <- stl(series, s.window=8)
series.adj <- seasadj(stl.fit)


seasonal <- stl.fit$time.series[, 1]
seasonal_train <- stl(window(series, end = c(2015,6)), s.window = 8)[[1]][,1]

seasonal
seasonal_train

In [ ]:
#Analysing their plots
plot(seasonal)
plot(seasonal_train)

The plots look pretty much the same, and the data points don't differ much across the two datasets.

What is exponential smoothing?

Exponential smoothing is a method for producing smoothed time series data. In moving averages (MA), the past values are weighted equally, whereas in exponential smoothing the weights decrease exponentially as the observations get older, i.e. the latest time period gets the maximum weight. Based on the level, trend and seasonality of past data points, we can then predict future values.

Its simplest form can be written as

\begin{align*} s_{t+1} &= \alpha x_{t} + (1-\alpha) s_{t} \end{align*}

which can also be written as

\begin{align*} s_{t+1} &= s_{t} + \alpha \epsilon_{t} \end{align*}

where $0 \le \alpha \le 1$ and $\epsilon_{t}$ is the forecast error (actual minus forecast) for period $t$.

In other words, the smoothed statistic $s_{t+1}$ is a weighted average of the current observation $x_{t}$ and the previous smoothed statistic $s_{t}$. The term smoothing factor applied to $\alpha$ here is something of a misnomer, as larger values of $\alpha$ actually reduce the level of smoothing, and in the limiting case with $\alpha = 1$ the output series is just the same as the original series. Simple exponential smoothing is easily applied, and it produces a smoothed statistic as soon as two observations are available.

Values of α close to one have less of a smoothing effect and give greater weight to recent changes in the data, while values of α closer to zero have a greater smoothing effect and are less responsive to recent changes.

The unknown parameters and the initial values for any exponential smoothing method can be estimated by minimizing the SSE (sum of squared errors), where the errors are the one-step-ahead forecast errors $e_{t} = y_{t} - \hat{y}_{t|t-1}$.
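
To make the recursion above concrete, here is a minimal hand-rolled SES. This is a sketch for illustration only; ses_manual is a made-up helper, not part of the forecast package (whose ses function is used below).

In [ ]:
# minimal simple exponential smoothing: s[t+1] = alpha * x[t] + (1 - alpha) * s[t]
ses_manual <- function(x, alpha) {
    s <- numeric(length(x))
    s[1] <- x[1]                       # initialise with the first observation
    for (t in 2:length(x)) {
        s[t] <- alpha * x[t - 1] + (1 - alpha) * s[t - 1]
    }
    ts(s, start=start(x), frequency=frequency(x))
}

plot(series, col="grey")
lines(ses_manual(series, alpha=0.2), col="red")   # heavy smoothing
lines(ses_manual(series, alpha=0.8), col="blue")  # light smoothing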

Three types of smoothing

* Single
* Double / Holt's Linear model
* Triple / Holt Winters

1. Single/ Simple Exponential Smoothing

The simplest of the exponential smoothing methods is called "simple exponential smoothing" (SES), also known as "single exponential smoothing". This method is suitable for forecasting data with no trend or seasonal pattern.

\begin{align*} \hat{y}_{T+1|T} &= \alpha y_{T} + \alpha(1-\alpha) y_{T-1} + \alpha(1-\alpha)^{2} y_{T-2} + \cdots \end{align*}

In [ ]:
# Simple exponential smoothing
model4 = ses(train_data, alpha = 0.001)
model1 = ses(train_data, alpha = 0.5)
model2 = ses(train_data, alpha = 0.1)
model3 = ses(train_data)
summary(model3)

In [ ]:
plot(model1) # forecast plot for alpha = 0.5
lines(model2$mean, col = 2) # red: alpha = 0.1
lines(model3$mean, col = 1) # black: estimated alpha
lines(model4$mean, col = "green") # green: alpha = 0.001

2. Double / Holt's linear smoothing

Simple exponential smoothing does not take trend into account, but this method involves a forecast equation and two smoothing equations (one for the level and one for the trend).

\begin{align*} \text{Forecast equation} && \hat{y}_{t+1|t} &= \ell_{t} + b_{t} \\ \text{Level equation} && \ell_{t} &= \alpha y_{t} + (1 - \alpha)(\ell_{t-1} + b_{t-1}) \\ \text{Trend equation} && b_{t} &= \beta(\ell_{t} - \ell_{t-1}) + (1 - \beta)b_{t-1} \end{align*}

α is the smoothing parameter for the level, 0≤α≤1, and β is the smoothing parameter for the trend, 0≤β≤1. As the value of β moves closer to 1, only the trend of the most recent values is taken into account when predicting future values.


In [ ]:
#Holt's linear trend
holt1 = holt(train_data, alpha = 0.2, beta = 0.5, initial = "simple", h= 3)
holt2 = holt(train_data, alpha = 0.5, beta = 1, initial = "simple", h =3)
holt3 = holt(train_data, h = 3)
summary(holt3)

In [ ]:
plot(holt1)
lines(holt2$mean, col = "red")
lines(holt3$mean, col = "black")

Multiplicative/ Exponential trend

Holt's linear trend model is additive in nature: the growth rate $b_t$ is added to the estimated level $\ell_t$ to forecast future data points. In the multiplicative (exponential) trend variant, it is instead multiplied with the estimated level $\ell_t$. Also, the trend in the forecast function is now exponential rather than linear, so the forecasts project a constant growth rate rather than a constant slope.


In [ ]:
exp_trend1 <- holt(train_data, exponential = TRUE) #Multiplicative
exp_trend2 <- holt(train_data) #Additive

plot(exp_trend1)
lines(exp_trend2$mean, col = "red") #additive

Damped trend model

Holt's linear model projects a constant trend (increasing or decreasing) indefinitely into the future. Even more extreme are the forecasts generated by the exponential trend method, which include exponential growth or decline. Both tend to over-forecast future data points, which especially affects longer forecast horizons.

So, we introduce a parameter that "dampens" the trend to a flat line some time in the future. The damping may be:

* Additive
* Multiplicative

In [ ]:
damp1 <- holt(train_data, exponential = TRUE, damped = TRUE) #Multiplicative damped
damp2 <- holt(train_data, damped = TRUE) #Additive damped
plot(damp1)
lines(damp2$mean, col = "red")

3. Triple/ Holt-Winters model

Triple exponential smoothing takes into account seasonal changes as well as trends. Seasonality is defined to be the tendency of time-series data to exhibit behavior that repeats itself every L periods.

For example: sales of soft drinks tend to peak during summer every year, which is a seasonal pattern.

\begin{align*} \hat{y}_{t+h|t} &= \ell_{t} + hb_{t} + s_{t-m+h_m^+} \\ \ell_{t} &= \alpha(y_{t} - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1}) \\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)b_{t-1} \\ s_{t} &= \gamma (y_{t}-\ell_{t-1}-b_{t-1}) + (1-\gamma)s_{t-m} \end{align*}

Like the trend, seasonality may be additive or multiplicative. A damped trend can also be used within a seasonal model, as sketched below.
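
The HoltWinters() calls below do not expose damping; forecast's hw() wrapper does, so a damped-trend seasonal fit might look like this (a sketch):

In [ ]:
# additive Holt-Winters with a damped trend, via forecast::hw
hw_damped <- hw(train_data, seasonal = "additive", damped = TRUE, h = 12)
plot(hw_damped)
lines(test_data, col = "green")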


In [ ]:
# Holt-Winters
hw1 = HoltWinters(train_data, seasonal = "additive")
hw2 = HoltWinters(train_data, seasonal = "multiplicative")

In [ ]:
pred_hw1 = forecast.HoltWinters(hw1, h = 12) # additive
pred_hw2 = forecast.HoltWinters(hw2, h = 12) # multiplicative

plot(pred_hw1)#Additive
lines(pred_hw2$mean, col = "red")#Multiplicative
lines(test_data, col = "green")

Exponential smoothing: Overview of all models

* Simple model (additive or multiplicative level)
* Holt's linear model (additive/multiplicative trend, with a damping parameter)
* Holt-Winters model (additive/multiplicative seasonality, with a damping factor)

Fitting data with ets


In [ ]:
#Preprocessing: replace zeros in the data so that multiplicative models can be fitted
train_data[train_data==0]=0.01 

#Fitting a model with ets function

ets1 = ets(train_data, model = "AAA")
summary(ets1)

In [ ]:
forecast.ets(ets1, h=12)

In [ ]:
accuracy(forecast.ets(ets1, h=12))

In [ ]:
#Ljung-Box test - checks whether the residuals are uncorrelated (white noise)
Box.test(ets1$residuals, lag = 20, type = "Ljung-Box")
p_value <- Box.test(ets1$residuals, lag = 20, type = "Ljung-Box")$p.value
Acf(ets1$residuals)

Analysing different models


In [ ]:
series.adj

#Removing negative values by adding a constant bias (the minimum value becomes 1)
min_ts_value <- min(series.adj)
bias_value <- (-1*min_ts_value) + 1

print(bias_value)

In [ ]:
series.adj <- series.adj + bias_value
train_data <- window(series.adj, end=c(2015, 6))
test_data <- window(series.adj, start=c(2015, 7), end = c(2016,6))
forecast_values <- length(test_data)

train_data[train_data==0]=0.01 
print(train_data)

#Using the seasonal component of the entire dataset as the average seasonal component
seasonal <- stl.fit[[1]][,1]

In [ ]:
all_types = c("ANN","AAN","AAA","ANA","MNN","MAN","MNA","MAA","MMN","MNM","MMM","MAM")

# For eg: AAA -> additive errors, additive trend and additive seasonality
# ANN -> additive errors, no trend or seasonality

In [ ]:
all_fit <- list()
test_models <- list()

print("Fitting various models: ")
for (bool in c(TRUE, FALSE)) {
    for (model_type in all_types) {

        # damping applies to the trend, so skip damped variants of models with no trend ("N")
        if (bool & substr(model_type, 2, 2) == "N") {
            next
        }

        test_model <- ets(train_data, model = model_type, damped = bool)
        #Box.test(test_model$residuals, lag = 20, type = "Ljung-Box")$p.value
        # [10] picks the test-set MAPE out of the accuracy() matrix (column-major indexing)
        all_fit[[paste0("ETS Model: ", model_type, ", Damped: ", bool)]][1] <- accuracy(forecast.ets(test_model, h = forecast_values), test_data)[10]
        all_fit[[paste0("ETS Model: ", model_type, ", Damped: ", bool)]][2] <- 100 * (Box.test(test_model$residuals, lag = 20, type = "Ljung-Box")$p.value)

        test_models[[paste0("ETS Model: ", model_type, ", Damped: ", bool)]] <- test_model

        print(test_model$method)
        print(accuracy(forecast.ets(test_model, h = 12), test_data)[10])
        print("")

        # models with autocorrelated residuals (at the 10% significance level) are excluded in the next cell
    }
}

In [ ]:
head(all_fit)

In [ ]:
level_of_significance <- 10

#proper_models <- all_fit[sapply(all_fit, function(x) x[2] <= level_of_significance)]
proper_models <- all_fit
if (length(proper_models) == 0) {
    print("None of the models satisfies the Ljung-Box test; taking the 3 models with the lowest p-values")
    p_values <- sapply(all_fit, function(x) x[2])
    proper_models <- all_fit[order(p_values)][1:3]
}

best_mape <- min(sapply(proper_models, function(x) x[1]))
best_model <- names(which.min(sapply(proper_models, function(x) x[1])))

#print(paste0("Complaint: ", names(TS_data)))
print(paste0("Best Model: ", best_model))
print(paste0("Best MAPE: ", best_mape))

In [ ]:
#Find top n models

top_models <- c()
Top_n <- 3

if(length(proper_models)<3){Top_n <- length(proper_models)}

top_mape_val <- proper_models[order(sapply(proper_models, function(x)x[1]))][1:Top_n]
top_models <- names(top_mape_val)

In [ ]:
top_models

In [ ]:
## Function for finding the average of seasonal components
period_stat <- function(ts_data_in, type = 1, start_value, years) {
    # type 1: sum
    # type 2: mean

    freq <- frequency(ts_data_in)
    len <- length(ts_data_in)

    freq_vector <- numeric(0)
    freq_sum <- numeric(0)
    vec <- numeric(0)
    sum_vec <- numeric(0)

    start_val <- start(ts_data_in)

    ts_data_in <- c(rep(NA, start_val[2] - 1), ts_data_in)

    max_limit <- ceiling(len / freq)
    for (i in 1:max_limit) {

        vec <- ts_data_in[(((i - 1) * freq) + 1):(((i - 1) * freq) + freq)]
        freq_vector <- as.numeric(!is.na(vec))
        vec[is.na(vec)] <- 0

        if (i == 1) {
            sum_vec <- vec
            freq_sum <- freq_vector
        } else {
            sum_vec <- sum_vec + vec
            freq_sum <- freq_sum + freq_vector
        }
    }

    final_ts <- numeric(0)
    if (type == 1) {
        final_ts <- sum_vec
    } else if (type == 2) {
        final_ts <- sum_vec / freq_sum
    } else {
        stop("Invalid type")
    }

    return(ts(rep(final_ts, years), frequency = freq, start = start_value))
}

In [ ]:
seasonal_mean <- period_stat(seasonal,2,c(2012,1),years = 7)
seasonal
seasonal_mean

#Comparing the seasonal component with its periodwise average
plot(seasonal_mean)
lines(seasonal, col = "red")

In [ ]:
#plot(forecast.ets(test_models[[top_models[1]]],h=12))
plot(series.adj + seasonal, col = "black")
lines(test_data + seasonal, col = "blue")
lines(forecast.ets(test_models[[top_models[1]]],h=12)$mean + seasonal_mean, col = "red")

print(forecast.ets(test_models[[top_models[2]]],h=12)$mean)
print(test_data)
accuracy((forecast(test_models[[top_models[2]]],h=12)),(test_data ))
#print(forecast.ets(test_models[[top_models[2]]],h=12)$mean + test_ses)
#print(test_data + test_ses)

In [ ]:
#Subtracting the bias value which was added earlier to overcome the negative values
ES_series_bias <- series.adj - bias_value
test_series_bias <- test_data - bias_value
forecast1_bias <- forecast.ets(test_models[[top_models[1]]],h=12)$mean - bias_value
forecast2_bias <- forecast.ets(test_models[[top_models[2]]],h=12)$mean - bias_value
forecast3_bias <- forecast.ets(test_models[[top_models[3]]],h=12)$mean - bias_value

#Adding back the seasonal component from the stl decomposition
ES_value <- ES_series_bias + seasonal
test_series <- test_series_bias + seasonal

forecast1 <- forecast1_bias + seasonal_mean
forecast2 <- forecast2_bias + seasonal_mean
forecast3 <- forecast3_bias + seasonal_mean

accuracy(test_models[[top_models[1]]])
accuracy(test_models[[top_models[2]]])
accuracy(test_models[[top_models[3]]])

ES_series_bias
test_series_bias
forecast1_bias

test_series
forecast1

#seasonal
#seasonal_mean

In [ ]:
#Checking the MAPE values with original data
print(paste0("Top model: ", top_models[1]))
accuracy(forecast1,test_series)
print(paste0("Top model: ", top_models[2]))
accuracy(forecast2,test_series)
print(paste0("Top model: ", top_models[3]))
accuracy(forecast3,test_series)

#accuracy(test_data, forecast.ets(test_models[[top_models[3]]],h=12)$mean)
forecast2
test_series

plot(ES_value)
lines(forecast2, col = "blue")
lines(test_series, col = "red")

In [ ]:
#Ljung-Box test - checks whether the residuals are uncorrelated (white noise)
#Top model
print(top_models[1])
Box.test(test_models[[top_models[1]]]$residuals, lag = 20, type = "Ljung-Box")
p_value <- Box.test(test_models[[top_models[1]]]$residuals, lag = 20, type = "Ljung-Box")
Acf(test_models[[top_models[1]]]$residuals)

In [ ]:
#Ljung-Box test - checks whether the residuals are uncorrelated (white noise)
#Top second model
print(top_models[2])
Box.test(test_models[[top_models[2]]]$residuals, lag = 20, type = "Ljung-Box")
p_value <- Box.test(test_models[[top_models[2]]]$residuals, lag = 20, type = "Ljung-Box")
Acf(test_models[[top_models[2]]]$residuals)

In [ ]:
#Ljung-Box test - checks whether the residuals are uncorrelated (white noise)
#Top Third model
print(top_models[3])
Box.test(test_models[[top_models[3]]]$residuals, lag = 20, type = "Ljung-Box")
p_value <- Box.test(test_models[[top_models[3]]]$residuals, lag = 20, type = "Ljung-Box")
Acf(test_models[[top_models[3]]]$residuals)

In [ ]:
plot(ES_value, col = "black") #Original (reconstructed) data set
lines(test_series, col = "blue") #Original test data
lines(test_series_bias + seasonal_mean, col = "black") #Deseasonalised test data with the average seasonal component applied
lines(forecast1, col = "red") #Top model
lines(forecast2, col = "green") #Second-best model
lines(forecast3, col = "yellow") #Third-best model
